Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 7050 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 53 |
| Duplicate rows (%) | 0.8% |
| Total size in memory | 716.1 KiB |
| Average record size in memory | 104.0 B |
Variable types
| Categorical | 4 |
|---|---|
| Numeric | 9 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 53 (0.8%) duplicate rows | Duplicates |
| status_id has a high cardinality: 6997 distinct values | High cardinality |
| num_reactions is highly correlated with num_likes | High correlation |
| num_likes is highly correlated with num_reactions | High correlation |
| num_hahas is highly skewed (γ1 = 20.30574123) | Skewed |
| status_id is uniformly distributed | Uniform |
| num_reactions has 121 (1.7%) zeros | Zeros |
| num_comments has 2119 (30.1%) zeros | Zeros |
| num_shares has 3911 (55.5%) zeros | Zeros |
| num_likes has 126 (1.8%) zeros | Zeros |
| num_loves has 4230 (60.0%) zeros | Zeros |
| num_wows has 5308 (75.3%) zeros | Zeros |
| num_hahas has 5916 (83.9%) zeros | Zeros |
| num_sads has 6443 (91.4%) zeros | Zeros |
| num_angrys has 6627 (94.0%) zeros | Zeros |
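The zero and duplicate counts listed above are simple tallies. The following is a minimal sketch of how such figures could be computed, using only the Python standard library. The sample rows and the helper names `zero_share` and `duplicate_rows` are hypothetical, and the duplicate definition used here (rows that repeat an earlier identical row) is an assumption; the report does not state which definition it applies.

```python
from collections import Counter

# Hypothetical sample rows: (status_id, num_comments, num_shares).
rows = [
    ("a_1", 0, 0),
    ("a_2", 5, 0),
    ("a_2", 5, 0),   # exact duplicate of the previous row
    ("a_3", 12, 3),
]

def zero_share(rows, col):
    """Return (count, percentage) of rows whose value in column `col` is 0."""
    zeros = sum(1 for r in rows if r[col] == 0)
    return zeros, 100.0 * zeros / len(rows)

def duplicate_rows(rows):
    """Count rows that repeat an earlier identical row (assumed definition)."""
    counts = Counter(rows)
    return sum(c - 1 for c in counts.values() if c > 1)

zeros, pct = zero_share(rows, 2)
dups = duplicate_rows(rows)
print(f"num_shares has {zeros} ({pct:.1f}%) zeros")
print(f"Dataset has {dups} duplicate rows")
```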
Reproduction
| Analysis started | 2021-05-02 10:45:36.047497 |
|---|---|
| Analysis finished | 2021-05-02 10:45:45.627808 |
| Duration | 9.58 seconds |
| Software version | pandas-profiling v2.11.0 |
| Download configuration | config.yaml |
status_id
Categorical
| Distinct | 6997 |
|---|---|
| Distinct (%) | 99.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.2 KiB |
Length
| Max length | 33 |
|---|---|
| Median length | 31 |
| Mean length | 31.31531915 |
| Min length | 31 |
Characters and Unicode
| Total characters | 220773 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
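As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point: `Nd` corresponds to "Decimal Number" and `Pc` to "Connector Punctuation". This is a minimal sketch, not the report's own implementation; the helper name `category_counts` is hypothetical, and the input is the first `status_id` from the Sample table.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# First status_id shown in the Sample table of the report.
sample = ["246675545449582_1649696485147474"]
counts = category_counts(sample)
for category, count in counts.most_common():
    print(category, count)
```

Scripts and blocks are not exposed by `unicodedata`, so this sketch covers categories only.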
Unique
| Unique | 6944 |
|---|---|
| Unique (%) | 98.5% |
Sample
| 1st row | 246675545449582_1649696485147474 |
|---|---|
| 2nd row | 246675545449582_1649426988507757 |
| 3rd row | 246675545449582_1648730588577397 |
| 4th row | 246675545449582_1648576705259452 |
| 5th row | 246675545449582_1645700502213739 |
| Value | Count | Frequency (%) |
| 819700534875473_998824716963053 | 2 | < 0.1% |
| 819700534875473_957697307742461 | 2 | < 0.1% |
| 819700534875473_1000607730118085 | 2 | < 0.1% |
| 819700534875473_976401089205416 | 2 | < 0.1% |
| 819700534875473_963754250470100 | 2 | < 0.1% |
| 819700534875473_993975437447981 | 2 | < 0.1% |
| 819700534875473_957599447752247 | 2 | < 0.1% |
| 819700534875473_968264653352393 | 2 | < 0.1% |
| 819700534875473_999880033524188 | 2 | < 0.1% |
| 819700534875473_965407870304738 | 2 | < 0.1% |
| Other values (6987) | 7030 |
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 33235 | |
| 4 | 29099 | |
| 8 | 23717 | |
| 1 | 23706 | |
| 6 | 23509 | |
| 2 | 18242 | |
| 7 | 18191 | |
| 3 | 15301 | |
| 0 | 14681 | |
| 9 | 14042 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 213723 | |
| Connector Punctuation | 7050 | 3.2% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 5 | 33235 | |
| 4 | 29099 | |
| 8 | 23717 | |
| 1 | 23706 | |
| 6 | 23509 | |
| 2 | 18242 | |
| 7 | 18191 | |
| 3 | 15301 | |
| 0 | 14681 | |
| 9 | 14042 |
| Value | Count | Frequency (%) |
| _ | 7050 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 220773 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 5 | 33235 | |
| 4 | 29099 | |
| 8 | 23717 | |
| 1 | 23706 | |
| 6 | 23509 | |
| 2 | 18242 | |
| 7 | 18191 | |
| 3 | 15301 | |
| 0 | 14681 | |
| 9 | 14042 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 220773 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 5 | 33235 | |
| 4 | 29099 | |
| 8 | 23717 | |
| 1 | 23706 | |
| 6 | 23509 | |
| 2 | 18242 | |
| 7 | 18191 | |
| 3 | 15301 | |
| 0 | 14681 | |
| 9 | 14042 |
num_reactions
Numeric
| Distinct | 1067 |
|---|---|
| Distinct (%) | 15.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 230.1171631 |
|---|---|
| Minimum | 0 |
| Maximum | 4710 |
| Zeros | 121 |
| Zeros (%) | 1.7% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 17 |
| median | 59.5 |
| Q3 | 219 |
| 95-th percentile | 1239.65 |
| Maximum | 4710 |
| Range | 4710 |
| Interquartile range (IQR) | 202 |
Descriptive statistics
| Standard deviation | 462.6253091 |
|---|---|
| Coefficient of variation (CV) | 2.010390285 |
| Kurtosis | 16.73644703 |
| Mean | 230.1171631 |
| Median Absolute Deviation (MAD) | 52.5 |
| Skewness | 3.738452153 |
| Sum | 1622326 |
| Variance | 214022.1767 |
| Monotonicity | Not monotonic |
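Several rows in the table above follow from the others: the coefficient of variation is the standard deviation divided by the mean (462.6253091 / 230.1171631 ≈ 2.0104), the interquartile range is Q3 − Q1 (219 − 17 = 202), and the range is maximum minus minimum. The sketch below derives the same kinds of statistics with Python's `statistics` module. The sample values and the helper name `describe` are hypothetical, and the use of the sample standard deviation and of linear-interpolation quantiles are assumptions about how the report computes them.

```python
import statistics

def describe(values):
    """Derive a few descriptive statistics from a list of numbers."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    median = statistics.median(values)
    # "inclusive" interpolates linearly between data points, min and max included.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Median absolute deviation: median of the distances from the median.
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "mad": mad,
    }

sample = [0, 1, 2, 3, 4, 10, 50]  # hypothetical counts, not taken from the dataset
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```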
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 131 | 1.9% |
| 2 | 124 | 1.8% |
| 0 | 121 | 1.7% |
| 14 | 121 | 1.7% |
| 3 | 116 | 1.6% |
| 12 | 112 | 1.6% |
| 18 | 111 | 1.6% |
| 11 | 106 | 1.5% |
| 10 | 101 | 1.4% |
| 19 | 97 | 1.4% |
| Other values (1057) | 5910 |
| Value | Count | Frequency (%) |
| 0 | 121 | |
| 1 | 131 | |
| 2 | 124 | |
| 3 | 116 | |
| 4 | 78 |
| Value | Count | Frequency (%) |
| 4710 | 1 | |
| 4410 | 1 | |
| 4315 | 2 | |
| 4114 | 2 | |
| 4094 | 1 |
num_comments
Numeric
| Distinct | 993 |
|---|---|
| Distinct (%) | 14.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 224.3560284 |
|---|---|
| Minimum | 0 |
| Maximum | 20990 |
| Zeros | 2119 |
| Zeros (%) | 30.1% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 23 |
| 95-th percentile | 1210.65 |
| Maximum | 20990 |
| Range | 20990 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 889.6368195 |
|---|---|
| Coefficient of variation (CV) | 3.965290463 |
| Kurtosis | 126.8628701 |
| Mean | 224.3560284 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 9.028850488 |
| Sum | 1581710 |
| Variance | 791453.6706 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2119 | |
| 1 | 564 | 8.0% |
| 2 | 364 | 5.2% |
| 3 | 309 | 4.4% |
| 4 | 249 | 3.5% |
| 5 | 213 | 3.0% |
| 6 | 188 | 2.7% |
| 7 | 154 | 2.2% |
| 8 | 136 | 1.9% |
| 9 | 133 | 1.9% |
| Other values (983) | 2621 |
| Value | Count | Frequency (%) |
| 0 | 2119 | |
| 1 | 564 | 8.0% |
| 2 | 364 | 5.2% |
| 3 | 309 | 4.4% |
| 4 | 249 | 3.5% |
| Value | Count | Frequency (%) |
| 20990 | 1 | |
| 19013 | 1 | |
| 17404 | 1 | |
| 12003 | 1 | |
| 10960 | 1 |
num_shares
Numeric
| Distinct | 501 |
|---|---|
| Distinct (%) | 7.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.02255319 |
|---|---|
| Minimum | 0 |
| Maximum | 3424 |
| Zeros | 3911 |
| Zeros (%) | 55.5% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 4 |
| 95-th percentile | 260.1 |
| Maximum | 3424 |
| Range | 3424 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 131.5999655 |
|---|---|
| Coefficient of variation (CV) | 3.288145183 |
| Kurtosis | 96.8629404 |
| Mean | 40.02255319 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.099332142 |
| Sum | 282159 |
| Variance | 17318.55092 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 3911 | |
| 1 | 820 | 11.6% |
| 2 | 320 | 4.5% |
| 3 | 171 | 2.4% |
| 4 | 113 | 1.6% |
| 5 | 90 | 1.3% |
| 6 | 74 | 1.0% |
| 7 | 54 | 0.8% |
| 8 | 35 | 0.5% |
| 9 | 35 | 0.5% |
| Other values (491) | 1427 | 20.2% |
| Value | Count | Frequency (%) |
| 0 | 3911 | |
| 1 | 820 | 11.6% |
| 2 | 320 | 4.5% |
| 3 | 171 | 2.4% |
| 4 | 113 | 1.6% |
| Value | Count | Frequency (%) |
| 3424 | 1 | |
| 2139 | 1 | |
| 1636 | 1 | |
| 1618 | 1 | |
| 1430 | 1 |
num_likes
Numeric
| Distinct | 1044 |
|---|---|
| Distinct (%) | 14.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 215.0431206 |
|---|---|
| Minimum | 0 |
| Maximum | 4710 |
| Zeros | 126 |
| Zeros (%) | 1.8% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 17 |
| median | 58 |
| Q3 | 184.75 |
| 95-th percentile | 1160.1 |
| Maximum | 4710 |
| Range | 4710 |
| Interquartile range (IQR) | 167.75 |
Descriptive statistics
| Standard deviation | 449.4723571 |
|---|---|
| Coefficient of variation (CV) | 2.0901499 |
| Kurtosis | 18.42703221 |
| Mean | 215.0431206 |
| Median Absolute Deviation (MAD) | 50 |
| Skewness | 3.91912765 |
| Sum | 1516054 |
| Variance | 202025.3998 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 128 | 1.8% |
| 2 | 127 | 1.8% |
| 0 | 126 | 1.8% |
| 14 | 124 | 1.8% |
| 12 | 120 | 1.7% |
| 3 | 118 | 1.7% |
| 10 | 110 | 1.6% |
| 18 | 106 | 1.5% |
| 19 | 97 | 1.4% |
| 11 | 97 | 1.4% |
| Other values (1034) | 5897 |
| Value | Count | Frequency (%) |
| 0 | 126 | |
| 1 | 128 | |
| 2 | 127 | |
| 3 | 118 | |
| 4 | 76 |
| Value | Count | Frequency (%) |
| 4710 | 1 | |
| 4315 | 1 | |
| 4241 | 2 | |
| 4094 | 1 | |
| 4054 | 2 |
num_loves
Numeric
| Distinct | 229 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.72865248 |
|---|---|
| Minimum | 0 |
| Maximum | 657 |
| Zeros | 4230 |
| Zeros (%) | 60.0% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 3 |
| 95-th percentile | 77 |
| Maximum | 657 |
| Range | 657 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 39.97293011 |
|---|---|
| Coefficient of variation (CV) | 3.140389775 |
| Kurtosis | 50.57163221 |
| Mean | 12.72865248 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.004845077 |
| Sum | 89737 |
| Variance | 1597.835141 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 4230 | |
| 1 | 611 | 8.7% |
| 2 | 282 | 4.0% |
| 3 | 213 | 3.0% |
| 4 | 131 | 1.9% |
| 5 | 108 | 1.5% |
| 6 | 80 | 1.1% |
| 7 | 72 | 1.0% |
| 8 | 47 | 0.7% |
| 9 | 43 | 0.6% |
| Other values (219) | 1233 | 17.5% |
| Value | Count | Frequency (%) |
| 0 | 4230 | |
| 1 | 611 | 8.7% |
| 2 | 282 | 4.0% |
| 3 | 213 | 3.0% |
| 4 | 131 | 1.9% |
| Value | Count | Frequency (%) |
| 657 | 1 | |
| 529 | 1 | |
| 504 | 1 | |
| 485 | 1 | |
| 482 | 2 |
num_wows
Numeric
| Distinct | 65 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.289361702 |
|---|---|
| Minimum | 0 |
| Maximum | 278 |
| Zeros | 5308 |
| Zeros (%) | 75.3% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 278 |
| Range | 278 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 8.71965038 |
|---|---|
| Coefficient of variation (CV) | 6.762765147 |
| Kurtosis | 415.5921273 |
| Mean | 1.289361702 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 18.24681302 |
| Sum | 9090 |
| Variance | 76.03230276 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 5308 | |
| 1 | 704 | 10.0% |
| 2 | 347 | 4.9% |
| 3 | 216 | 3.1% |
| 4 | 137 | 1.9% |
| 5 | 77 | 1.1% |
| 6 | 55 | 0.8% |
| 7 | 29 | 0.4% |
| 8 | 28 | 0.4% |
| 9 | 19 | 0.3% |
| Other values (55) | 130 | 1.8% |
| Value | Count | Frequency (%) |
| 0 | 5308 | |
| 1 | 704 | 10.0% |
| 2 | 347 | 4.9% |
| 3 | 216 | 3.1% |
| 4 | 137 | 1.9% |
| Value | Count | Frequency (%) |
| 278 | 1 | |
| 252 | 1 | |
| 206 | 1 | |
| 200 | 1 | |
| 177 | 1 |
num_hahas
Numeric
| Distinct | 42 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6964539007 |
|---|---|
| Minimum | 0 |
| Maximum | 157 |
| Zeros | 5916 |
| Zeros (%) | 83.9% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 157 |
| Range | 157 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 3.957183443 |
|---|---|
| Coefficient of variation (CV) | 5.681902907 |
| Kurtosis | 587.181095 |
| Mean | 0.6964539007 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 20.30574123 |
| Sum | 4910 |
| Variance | 15.6593008 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=42)
| Value | Count | Frequency (%) |
| 0 | 5916 | |
| 1 | 399 | 5.7% |
| 2 | 228 | 3.2% |
| 3 | 149 | 2.1% |
| 4 | 100 | 1.4% |
| 5 | 64 | 0.9% |
| 6 | 34 | 0.5% |
| 8 | 24 | 0.3% |
| 7 | 20 | 0.3% |
| 9 | 17 | 0.2% |
| Other values (32) | 99 | 1.4% |
| Value | Count | Frequency (%) |
| 0 | 5916 | |
| 1 | 399 | 5.7% |
| 2 | 228 | 3.2% |
| 3 | 149 | 2.1% |
| 4 | 100 | 1.4% |
| Value | Count | Frequency (%) |
| 157 | 1 | |
| 102 | 1 | |
| 100 | 1 | |
| 97 | 1 | |
| 91 | 1 |
num_sads
Numeric
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2436879433 |
|---|---|
| Minimum | 0 |
| Maximum | 51 |
| Zeros | 6443 |
| Zeros (%) | 91.4% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 51 |
| Range | 51 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.59715594 |
|---|---|
| Coefficient of variation (CV) | 6.554103244 |
| Kurtosis | 427.0720932 |
| Mean | 0.2436879433 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 17.57886772 |
| Sum | 1718 |
| Variance | 2.550907095 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 0 | 6443 | |
| 1 | 321 | 4.6% |
| 2 | 113 | 1.6% |
| 3 | 64 | 0.9% |
| 4 | 37 | 0.5% |
| 5 | 14 | 0.2% |
| 6 | 12 | 0.2% |
| 8 | 12 | 0.2% |
| 10 | 6 | 0.1% |
| 7 | 6 | 0.1% |
| Other values (14) | 22 | 0.3% |
| Value | Count | Frequency (%) |
| 0 | 6443 | |
| 1 | 321 | 4.6% |
| 2 | 113 | 1.6% |
| 3 | 64 | 0.9% |
| 4 | 37 | 0.5% |
| Value | Count | Frequency (%) |
| 51 | 1 | |
| 46 | 2 | |
| 37 | 1 | |
| 28 | 1 | |
| 23 | 2 |
num_angrys
Numeric
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1131914894 |
|---|---|
| Minimum | 0 |
| Maximum | 31 |
| Zeros | 6627 |
| Zeros (%) | 94.0% |
| Memory size | 55.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 31 |
| Range | 31 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7268118907 |
|---|---|
| Coefficient of variation (CV) | 6.421082493 |
| Kurtosis | 624.7529455 |
| Mean | 0.1131914894 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 19.50712917 |
| Sum | 798 |
| Variance | 0.5282555244 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=14)
| Value | Count | Frequency (%) |
| 0 | 6627 | |
| 1 | 276 | 3.9% |
| 2 | 71 | 1.0% |
| 3 | 35 | 0.5% |
| 4 | 17 | 0.2% |
| 5 | 9 | 0.1% |
| 6 | 4 | 0.1% |
| 8 | 3 | < 0.1% |
| 7 | 2 | < 0.1% |
| 19 | 2 | < 0.1% |
| Other values (4) | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 6627 | |
| 1 | 276 | 3.9% |
| 2 | 71 | 1.0% |
| 3 | 35 | 0.5% |
| 4 | 17 | 0.2% |
| Value | Count | Frequency (%) |
| 31 | 1 | |
| 19 | 2 | |
| 12 | 1 | |
| 10 | 1 | |
| 9 | 1 |
status_link
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7050 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 0 | 6987 | |
| 1 | 63 | 0.9% |
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 6987 | |
| 1 | 63 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7050 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 6987 | |
| 1 | 63 | 0.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 7050 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 6987 | |
| 1 | 63 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7050 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 6987 | |
| 1 | 63 | 0.9% |
status_photo
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7050 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 1 |
| Value | Count | Frequency (%) |
| 1 | 4288 | |
| 0 | 2762 |
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 4288 | |
| 0 | 2762 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7050 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 1 | 4288 | |
| 0 | 2762 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 7050 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 1 | 4288 | |
| 0 | 2762 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7050 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 1 | 4288 | |
| 0 | 2762 |
status_status
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7050 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
| Value | Count | Frequency (%) |
| 0 | 6685 | |
| 1 | 365 | 5.2% |
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 6685 | |
| 1 | 365 | 5.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7050 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 6685 | |
| 1 | 365 | 5.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 7050 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 6685 | |
| 1 | 365 | 5.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7050 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 6685 | |
| 1 | 365 | 5.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
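The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. The function name `pearson_r` is hypothetical; the example inputs are the num_reactions and num_likes values from the first five rows of the "First rows" table.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# num_reactions and num_likes for the first five rows of the dataset.
reactions = [529, 150, 227, 111, 213]
likes = [432, 150, 204, 111, 204]
print(pearson_r(reactions, likes))
```

Because r is invariant under changes in location and positive changes in scale, `pearson_r(reactions, [3 * v + 7 for v in likes])` gives the same value up to floating-point rounding.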
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
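In other words, ρ is Pearson's r applied to ranks. A minimal sketch follows, assuming ties receive the average of the ranks they span (a common convention, not confirmed by the report); the names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r, used here on rank variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A nonlinear but strictly increasing relationship is perfectly monotonic.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```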
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
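The pair-counting definition given above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is hypothetical. Statistical libraries often default to a tie-corrected variant (τ-b), so results on data with many ties, such as the zero-heavy columns in this dataset, may differ from the figures in the report.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in X or Y; it counts toward neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```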
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
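Returning to the bias-corrected Cramér's V described earlier in this section, the following is a minimal sketch of the Bergsma (2013) correction computed from a contingency table of counts. It is not the report's own code: the function name `cramers_v_corrected` is hypothetical, the example tables are made up, and the sketch assumes every row and column total is non-zero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[10, 10], [10, 10]]  # made-up table with no association
perfect = [[20, 0], [0, 20]]        # made-up table with perfect association
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```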
First rows
| | status_id | num_reactions | num_comments | num_shares | num_likes | num_loves | num_wows | num_hahas | num_sads | num_angrys | status_link | status_photo | status_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 246675545449582_1649696485147474 | 529 | 512 | 262 | 432 | 92 | 3 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 246675545449582_1649426988507757 | 150 | 0 | 0 | 150 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 246675545449582_1648730588577397 | 227 | 236 | 57 | 204 | 21 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 246675545449582_1648576705259452 | 111 | 0 | 0 | 111 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 246675545449582_1645700502213739 | 213 | 0 | 0 | 204 | 9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 5 | 246675545449582_1645650162218773 | 217 | 6 | 0 | 211 | 5 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 6 | 246675545449582_1645564175560705 | 503 | 614 | 72 | 418 | 70 | 10 | 2 | 0 | 3 | 0 | 0 | 0 |
| 7 | 246675545449582_1644824665634656 | 295 | 453 | 53 | 260 | 32 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 8 | 246675545449582_1644655795651543 | 203 | 1 | 0 | 198 | 5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9 | 246675545449582_1638788379571618 | 170 | 9 | 1 | 167 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Last rows
| | status_id | num_reactions | num_comments | num_shares | num_likes | num_loves | num_wows | num_hahas | num_sads | num_angrys | status_link | status_photo | status_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7040 | 1050855161656896_1063071050435307 | 93 | 26 | 34 | 90 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7041 | 1050855161656896_1062020473873698 | 9 | 0 | 0 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7042 | 1050855161656896_1061944223881323 | 4 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7043 | 1050855161656896_1061918183883927 | 196 | 2 | 3 | 195 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7044 | 1050855161656896_1061906620551750 | 86 | 0 | 0 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7045 | 1050855161656896_1061863470556065 | 89 | 0 | 0 | 89 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7046 | 1050855161656896_1061334757275603 | 16 | 0 | 0 | 14 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 7047 | 1050855161656896_1060126464063099 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7048 | 1050855161656896_1058663487542730 | 351 | 12 | 22 | 349 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 7049 | 1050855161656896_1050858841656528 | 17 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Most frequent
| | status_id | num_reactions | num_comments | num_shares | num_likes | num_loves | num_wows | num_hahas | num_sads | num_angrys | status_link | status_photo | status_status | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 246675545449582_326883450762124 | 211 | 2 | 0 | 211 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 1 | 246675545449582_429583263825475 | 537 | 16 | 1 | 537 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 2 | 819700534875473_1000607730118085 | 1704 | 21 | 3 | 1685 | 15 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | 2 |
| 3 | 819700534875473_1001982519980606 | 255 | 7 | 4 | 249 | 6 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 4 | 819700534875473_1002372733274918 | 376 | 20 | 3 | 354 | 19 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 5 | 819700534875473_951614605017398 | 985 | 71 | 42 | 962 | 5 | 16 | 2 | 0 | 0 | 0 | 1 | 0 | 2 |
| 6 | 819700534875473_953048221540703 | 1985 | 39 | 21 | 1961 | 11 | 12 | 0 | 1 | 0 | 0 | 1 | 0 | 2 |
| 7 | 819700534875473_954387151406810 | 186 | 15 | 1 | 172 | 3 | 11 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 8 | 819700534875473_955149101330615 | 114 | 6 | 1 | 108 | 3 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 9 | 819700534875473_955743124604546 | 879 | 165 | 18 | 867 | 4 | 8 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |